Leiden
Mapping Hymns and Organizing Concepts in the Rigveda: Quantitatively Connecting the Vedic Suktas
Bollineni, Venkatesh, Crk, Igor, Gultepe, Eren
Accessing and gaining insight into the Rigveda poses a non-trivial challenge due to its extremely ancient Sanskrit language, poetic structure, and large volume of text. Using NLP techniques, this study identified topics and semantic connections among hymns within the Rigveda that were corroborated by seven well-known groupings of hymns. The 1,028 suktas (hymns) from the modern English translation of the Rigveda by Jamison and Brereton were preprocessed, and sukta-level embeddings were obtained using (i) a novel adaptation of LSA presented herein, (ii) SBERT, and (iii) Doc2Vec. Following a UMAP dimension reduction of the vectors, a network of suktas was formed using k-nearest neighbours. Community detection of topics in the sukta networks was then performed with the Louvain, Leiden, and label propagation methods, and the statistical significance of the resulting topics was assessed against an appropriate null distribution. Only the novel LSA adaptation combined with the Leiden method detected sukta topic networks that were statistically significant (z = 2.726, p < .01), with a modularity score of 0.944. Of the seven well-known sukta groupings analyzed (e.g., creation, funeral, water), the LSA-derived network recovered all seven, while the Doc2Vec network was not significant and failed to detect the relevant suktas. SBERT detected four of the well-known groupings as separate communities but mistakenly merged the remaining three into a single mixed group, and its network was likewise not statistically significant.
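A minimal sketch of this kind of pipeline, assuming SBERT embeddings via sentence-transformers, umap-learn for reduction, scikit-learn for the k-NN graph, and igraph/leidenalg for community detection (illustrative library and parameter choices, not the authors' exact code):

# Sketch: embed hymns, reduce with UMAP, build a k-NN graph, detect communities with Leiden.
# Library choices, parameters, and hymn texts below are placeholders/assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import kneighbors_graph
import umap
import igraph as ig
import leidenalg

suktas = [f"placeholder text of sukta {i}" for i in range(1028)]  # one string per hymn

# 1) Sukta-level embeddings (SBERT shown; LSA or Doc2Vec vectors would slot in here instead).
emb = SentenceTransformer("all-MiniLM-L6-v2").encode(suktas)

# 2) Nonlinear dimension reduction with UMAP.
emb_low = umap.UMAP(n_components=5, random_state=0).fit_transform(emb)

# 3) k-nearest-neighbour graph over the reduced vectors.
adj = kneighbors_graph(emb_low, n_neighbors=10, mode="connectivity")
src, dst = adj.nonzero()
g = ig.Graph(n=len(suktas), edges=list(zip(src.tolist(), dst.tolist())))
g.simplify()  # drop duplicate edges from the directed k-NN construction

# 4) Leiden community detection maximizing modularity.
partition = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition, seed=0)
print("communities:", len(partition), "modularity:", partition.modularity)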
Community detection using fast low-cardinality semidefinite programming
Modularity maximization has been a fundamental tool for understanding the community structure of a network, but the underlying optimization problem is nonconvex and NP-hard to solve. State-of-the-art algorithms like the Louvain and Leiden methods rely on different heuristics to help escape local optima, but they still depend on a greedy step that moves node assignments locally and is prone to getting trapped. In this paper, we propose a new class of low-cardinality algorithms that generalize the local update to maximize a semidefinite relaxation derived from max-k-cut. The proposed algorithm is scalable, empirically achieves global semidefinite optimality on small instances, and outperforms state-of-the-art algorithms on real-world datasets with little additional time cost. From an algorithmic perspective, it also opens a new avenue for scaling up semidefinite programming when the solutions are sparse instead of low-rank.
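For reference, the quantity these local-move heuristics (and the proposed relaxation) seek to maximize is the standard modularity

\[
Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j),
\]

where A is the adjacency matrix, k_i the degree of node i, m the number of edges, and \delta(c_i, c_j) equals 1 when nodes i and j are assigned to the same community. In the max-k-cut view, the discrete community assignments are replaced by vectors whose pairwise inner products are collected in a positive semidefinite matrix, which yields the semidefinite relaxation referred to in the abstract.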
Exploring Open-world Continual Learning with Knowns-Unknowns Knowledge Transfer
Li, Yujie, Lai, Guannan, Yang, Xin, Li, Yonghao, Bonsangue, Marcello, Li, Tianrui
Open-World Continual Learning (OWCL) is a challenging paradigm in which models must incrementally learn new knowledge without forgetting while operating under an open-world assumption. This requires handling incomplete training data and recognizing unknown samples during inference. However, existing OWCL methods often treat open detection and continual learning as separate tasks, limiting their ability to integrate open-set detection with incremental classification. Moreover, current approaches focus primarily on transferring knowledge from known samples, neglecting the insights derived from unknown/open samples. To address these limitations, we formalize four distinct OWCL scenarios and conduct comprehensive empirical experiments to explore the challenges that arise in OWCL. Our findings reveal a significant interplay between the open detection of unknowns and the incremental classification of knowns, challenging the widely held assumption that unknown detection and known classification are orthogonal processes. Building on these insights, we propose HoliTrans (Holistic Knowns-Unknowns Knowledge Transfer), a novel OWCL framework that integrates nonlinear random projection (NRP) to create a more linearly separable embedding space and distribution-aware prototypes (DAPs) to construct an adaptive knowledge space. In particular, HoliTrans effectively supports knowledge transfer for both known and unknown samples while dynamically updating representations of open samples during OWCL. Extensive experiments across various OWCL scenarios demonstrate that HoliTrans outperforms 22 competitive baselines, bridging the gap between OWCL theory and practice and providing a robust, scalable framework for advancing open-world learning paradigms. Open-World Continual Learning (OWCL) [1], [2] represents a highly practical yet profoundly challenging machine learning paradigm: a model must continually adapt to an unbounded sequence of tasks in a dynamic open environment [3], [4], where novelties may emerge unpredictably at test time [5]-[7].
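The abstract does not give the exact form of the nonlinear random projection; a common construction of the idea (a fixed random linear map followed by a nonlinearity, shown here as a hypothetical NumPy sketch) looks like:

# Illustrative nonlinear random projection (NRP) in the spirit described above.
# The projection dimension, nonlinearity, and scaling are assumptions, not the paper's exact design.
import numpy as np

rng = np.random.default_rng(0)

def nonlinear_random_projection(features: np.ndarray, out_dim: int = 2048) -> np.ndarray:
    """Project features with a fixed random Gaussian matrix, then apply a ReLU nonlinearity."""
    in_dim = features.shape[1]
    W = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)  # fixed, not trained
    return np.maximum(features @ W, 0.0)  # higher-dimensional, more linearly separable space

# Example: project 512-d backbone embeddings of 100 samples into a 2048-d space.
z = nonlinear_random_projection(rng.standard_normal((100, 512)))
print(z.shape)  # (100, 2048)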
Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions
Li, Zhong, Huang, Qi, Yang, Lincen, Shi, Jiayang, Yang, Zhao, van Stein, Niki, Bäck, Thomas, van Leeuwen, Matthijs
In recent years, generative models have achieved remarkable performance across diverse applications, including image generation, text synthesis, audio creation, video generation, and data augmentation. Diffusion models have emerged as superior alternatives to Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) by addressing their limitations, such as training instability, mode collapse, and poor representation of multimodal distributions. This success has spurred widespread research interest. In the domain of tabular data, diffusion models have begun to showcase similar advantages over GANs and VAEs, achieving significant performance breakthroughs and demonstrating their potential for addressing the unique challenges of tabular data modeling. However, while domains such as images and time series have numerous surveys summarizing advances in diffusion models, there remains a notable gap for tabular data: despite increasing interest, there has been little effort to systematically review and summarize these developments, which limits a clear understanding of the challenges, progress, and future directions in this critical area. This survey addresses that gap by providing a comprehensive review of diffusion models for tabular data. Covering works from June 2015, when diffusion models emerged, to December 2024, we analyze nearly all relevant studies, with updates maintained in a \href{https://github.com/Diffusion-Model-Leiden/awesome-diffusion-models-for-tabular-data}{GitHub repository}. Assuming readers possess foundational knowledge of statistics and diffusion models, we employ mathematical formulations to deliver a rigorous and detailed review, aiming to promote developments in this emerging and exciting area.
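As a reminder of the formalism such a survey builds on, the standard Gaussian forward diffusion process corrupts a sample x_0 over T steps via

\[
q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right),
\qquad
q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right),
\]

with \alpha_t = 1 - \beta_t and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, while a learned reverse model p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) inverts the corruption; tabular variants typically adapt this noising process to handle mixed continuous and categorical features.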
MaLei at the PLABA Track of TAC-2024: RoBERTa for Task 1 -- LLaMA3.1 and GPT-4o for Task 2
Ling, Zhidong, Li, Zihao, Romero, Pablo, Han, Lifeng, Nenadic, Goran
This report is the system description of the MaLei team (Manchester and Leiden) for the shared task Plain Language Adaptation of Biomedical Abstracts (PLABA) 2024 (the team participated under the name BeeManc last year). The report contains two sections corresponding to the two sub-tasks of PLABA 2024. In Task 1, we applied fine-tuned RoBERTa-Base models to identify and classify difficult terms, jargon, and acronyms in biomedical abstracts and reported the F1 score; due to time constraints, we did not complete the replacement sub-task. In Task 2, we leveraged LLaMA-3.1-70B-Instruct and GPT-4o with one-shot prompts to complete the abstract adaptation and reported scores in BLEU, SARI, BERTScore, LENS, and SALSA. In the official PLABA-2024 evaluation on Tasks 1A and 1B, our \textbf{much smaller fine-tuned RoBERTa-Base} model ranked 3rd and 2nd, respectively, on the two sub-tasks, and \textbf{1st on averaged F1 score across the two tasks} among the 9 evaluated systems. Our LLaMA-3.1-70B-Instruct model achieved the \textbf{highest Completeness} score for Task 2. We share our fine-tuned models and related resources at \url{https://github.com/HECTA-UoM/PLABA2024}
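A minimal sketch of the Task 1 setup, running a fine-tuned RoBERTa token-classification model over an abstract with Hugging Face transformers (the model path and the labels it would emit are hypothetical placeholders, not the released checkpoints):

# Sketch: tag difficult terms, jargon, and acronyms in a biomedical abstract with a fine-tuned RoBERTa.
# "path/to/finetuned-roberta-base" is a hypothetical placeholder for a trained checkpoint.
from transformers import pipeline

tagger = pipeline(
    "token-classification",
    model="path/to/finetuned-roberta-base",
    aggregation_strategy="simple",  # merge subword pieces into whole spans
)

abstract = "Patients with atrial fibrillation received anticoagulation therapy."
for span in tagger(abstract):
    print(span["word"], span["entity_group"], round(span["score"], 3))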
Hypergraph Neural Networks Reveal Spatial Domains from Single-cell Transcriptomics Data
Spatial clustering of transcriptomics data is of paramount importance: it enables the classification of tissue samples into diverse subpopulations of cells, which in turn facilitates analysis of the biological functions of clusters, tissue reconstruction, and cell-cell interactions. Many approaches leverage gene expression, spatial locations, and histological images to detect spatial domains; however, Graph Neural Networks (GNNs), the current state-of-the-art models, are limited by their assumption of pairwise connections between nodes. In spatial-domain detection, cells that are not directly related may still belong to the same domain, which GNNs struggle to capture because they cannot model such implicit, higher-order connections among cells. While graph edges connect only two nodes, hyperedges connect an arbitrary number of nodes, allowing Hypergraph Neural Networks (HGNNs) to capture and exploit richer and more complex structural information than traditional GNNs. To cope with the absence of ground-truth labels, we use autoencoders, which are well suited to unsupervised learning. Our model demonstrates exceptional performance, achieving the highest iLISI score of 1.843 among the compared methods, indicating the greatest diversity of identified cell types. Furthermore, it outperforms other methods in downstream clustering, achieving the highest ARI of 0.51 and Leiden score of 0.60.
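For context, a widely used hypergraph convolution layer (introduced with HGNNs) propagates node features X^{(l)} through the hypergraph incidence matrix H as

\[
\mathbf{X}^{(l+1)} = \sigma\!\left( \mathbf{D}_v^{-1/2}\, \mathbf{H}\, \mathbf{W}\, \mathbf{D}_e^{-1}\, \mathbf{H}^{\top}\, \mathbf{D}_v^{-1/2}\, \mathbf{X}^{(l)}\, \boldsymbol{\Theta}^{(l)} \right),
\]

where D_v and D_e are the vertex- and hyperedge-degree matrices, W a diagonal matrix of hyperedge weights, and \Theta^{(l)} the learnable layer parameters. Whether this exact layer is the one used here is an assumption, but it illustrates how a single hyperedge can aggregate information from an arbitrary number of cells at once.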
MNIST-Nd: a set of naturalistic datasets to benchmark clustering across dimensions
Turishcheva, Polina, Hansel, Laura, Ritzert, Martin, Weis, Marissa A., Ecker, Alexander S.
Driven by advances in recording technology, large-scale high-dimensional datasets have emerged across many scientific disciplines. Especially in biology, clustering is often used to gain insight into the structure of such datasets, for instance to understand the organization of different cell types. However, clustering is known to scale poorly to high dimensions, even though the exact impact of dimensionality is unclear because current benchmark datasets are mostly two-dimensional. Here we propose MNIST-Nd, a set of synthetic datasets that share a key property of real-world datasets, namely that individual samples are noisy and clusters do not perfectly separate. MNIST-Nd is obtained by training mixture variational autoencoders with 2 to 64 latent dimensions on MNIST, resulting in six datasets with comparable structure but varying dimensionality. It thus offers the chance to disentangle the impact of dimensionality on clustering. Preliminary benchmarks of common clustering algorithms on MNIST-Nd suggest that Leiden is the most robust as dimensionality grows.
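A hedged sketch of the kind of benchmark such a dataset enables: cluster each latent-dimensionality variant with Leiden on a k-NN graph and score against the MNIST digit labels (library choices, parameters, and the random stand-in data are assumptions):

# Sketch: compare Leiden clustering quality (ARI vs. digit labels) across latent dimensionalities.
# `embeddings_by_dim` maps latent dimension -> (n_samples, dim) array; here filled with random data.
import numpy as np
import igraph as ig
import leidenalg
from sklearn.neighbors import kneighbors_graph
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)                        # stand-in for MNIST digit labels
embeddings_by_dim = {d: rng.standard_normal((1000, d)) for d in (2, 4, 8, 16, 32, 64)}

for dim, X in embeddings_by_dim.items():
    adj = kneighbors_graph(X, n_neighbors=15, mode="connectivity")
    src, dst = adj.nonzero()
    g = ig.Graph(n=X.shape[0], edges=list(zip(src.tolist(), dst.tolist())))
    g.simplify()
    part = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition, seed=0)
    print(f"dim={dim:2d}  communities={len(part):3d}  ARI={adjusted_rand_score(labels, part.membership):.3f}")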
Advanced Graph Clustering Methods: A Comprehensive and In-Depth Analysis
Watteau, Timothé, Bonnefoy, Aubin, Illouz-Laurent, Simon, Jusseau, Joaquim, Iovleff, Serge
Graph clustering, which aims to divide a graph into several homogeneous groups, is a critical area of study with applications that span various fields such as social network analysis, bioinformatics, and image segmentation. This paper explores both traditional and more recent approaches to graph clustering. Firstly, key concepts and definitions in graph theory are introduced. The background section covers essential topics, including graph Laplacians and the integration of Deep Learning in graph analysis. The paper then delves into traditional clustering methods, including Spectral Clustering and the Leiden algorithm. Following this, state-of-the-art clustering techniques that leverage deep learning are examined. A comprehensive comparison of these methods is made through experiments. The paper concludes with a discussion of the practical applications of graph clustering and potential future research directions.
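For the background topics mentioned above, the unnormalized and symmetric-normalized graph Laplacians on which spectral clustering rests are

\[
\mathbf{L} = \mathbf{D} - \mathbf{A},
\qquad
\mathbf{L}_{\mathrm{sym}} = \mathbf{I} - \mathbf{D}^{-1/2}\, \mathbf{A}\, \mathbf{D}^{-1/2},
\]

where A is the adjacency matrix and D the diagonal degree matrix. Spectral clustering embeds nodes using eigenvectors associated with the smallest eigenvalues of L (or L_sym) and then clusters that embedding, whereas the Leiden algorithm directly optimizes a quality function such as modularity through local moves, partition refinement, and graph aggregation.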